Efficient Reinforcement Learning in Factored MDPs
Authors
Abstract
We present a provably efficient and near-optimal algorithm for reinforcement learning in Markov decision processes (MDPs) whose transition model can be factored as a dynamic Bayesian network (DBN). Our algorithm generalizes the recent E3 algorithm of Kearns and Singh, and assumes that we are given both an algorithm for approximate planning and the graphical structure (but not the parameters) of the DBN. Unlike the original E3 algorithm, our new algorithm exploits the DBN structure to achieve a running time that scales polynomially in the number of parameters of the DBN, which may be exponentially smaller than the number of global states.
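To make the closing claim of the abstract concrete, the following is a minimal sketch (not the paper's algorithm) that counts free transition parameters for a flat MDP versus a DBN-factored one. It assumes binary state variables and a hypothetical chain-shaped DBN in which each variable depends on itself and its left neighbour; the function names are illustrative only.

```python
from typing import List


def flat_param_count(n_vars: int, n_actions: int) -> int:
    """Free parameters of an unfactored transition model over binary variables.

    Each (state, action) pair needs a distribution over all 2**n_vars next
    states, which has 2**n_vars - 1 free parameters.
    """
    n_states = 2 ** n_vars
    return n_actions * n_states * (n_states - 1)


def dbn_param_count(parents: List[List[int]], n_actions: int) -> int:
    """Free parameters of a DBN-factored transition model over binary variables.

    parents[i] lists the previous-step variables that variable i depends on.
    Each variable needs one free parameter per joint setting of its parents.
    """
    return n_actions * sum(2 ** len(pa) for pa in parents)


def chain_parents(n_vars: int) -> List[List[int]]:
    """Hypothetical structure: variable i depends on itself and variable i - 1."""
    return [[i] if i == 0 else [i - 1, i] for i in range(n_vars)]


if __name__ == "__main__":
    n_vars, n_actions = 10, 2
    print("flat:", flat_param_count(n_vars, n_actions))
    print("dbn: ", dbn_param_count(chain_parents(n_vars), n_actions))
```

Under these assumptions the flat count grows exponentially in the number of variables, while the factored count grows linearly as long as the number of parents per variable stays bounded.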
Related papers
Efficient Structure Learning in Factored-State MDPs
We consider the problem of reinforcement learning in factored-state MDPs in the setting in which learning is conducted in one long trial with no resets allowed. We show how to extend existing efficient algorithms that learn the conditional probability tables of dynamic Bayesian networks (DBNs) given their structure to the case in which DBN structure is not known in advance. Our method learns th...
TeXDYNA: Hierarchical Reinforcement Learning in Factored MDPs
Reinforcement learning is one of the main adaptive mechanisms that is both well documented in animal behaviour and giving rise to computational studies in animats and robots. In this paper, we present TeXDYNA, an algorithm designed to solve large reinforcement learning problems with unknown structure by integrating hierarchical abstraction techniques of Hierarchical Reinforcement Learning and f...
Efficient Abstraction Selection in Reinforcement Learning
This paper introduces a novel approach for abstraction selection in reinforcement learning problems modelled as factored Markov decision processes (MDPs), for which a state is described via a set of state components. In abstraction selection, an agent must choose an abstraction from a set of candidate abstractions, each built up from a different combination of...
Chi-square Tests Driven Method for Learning the Structure of Factored MDPs
SDYNA is a general framework designed to address large stochastic reinforcement learning (RL) problems. Unlike previous model-based methods in factored MDPs (FMDPs), it incrementally learns the structure of an RL problem using supervised learning techniques. SPITI is an instantiation of SDYNA that uses decision trees as factored representations. First, we show that, in structured RL problems, sp...
Sample Efficient Feature Selection for Factored MDPs
In reinforcement learning, the state of the real world is often represented by feature vectors. However, not all of the features may be pertinent for solving the current task. We propose Feature Selection Explore and Exploit (FS-EE), an algorithm that automatically selects the necessary features while learning a Factored Markov Decision Process, and prove that under mild assumptions, its sample...
Publication date: 1999